
Conversation

@ikawrakow (Owner)

This PR derives from PR 16847 in mainline.

On my GPU (RTX 4080) it is a very minor improvement over the main branch (~0.5% better TG for GPT-OSS-20B-MXFP4, less for other models). But based on the discussion in the mainline PR, it may lead to larger performance gains on GPUs with low memory bandwidth.

The PR also adds the `-mmvq | --merge-qkv` option (see #878) to llama-bench.
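Assuming the new flag behaves like other llama-bench toggles (a 0/1 value list so both settings are benchmarked in one run — the list syntax and model path below are assumptions, not taken from this PR), a comparison run might look like:

```sh
# Sketch: compare token generation with QKV merging off vs. on.
# The 0,1 list syntax and the model filename are assumptions for illustration.
./llama-bench -m gpt-oss-20b-mxfp4.gguf --merge-qkv 0,1
```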

@ikawrakow ikawrakow merged commit fd3757d into main Oct 31, 2025
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Nov 6, 2025
Nexesenex added a commit to Nexesenex/ik_llama.cpp.nxs that referenced this pull request Nov 6, 2025
ikawrakow pushed a commit that referenced this pull request Nov 11, 2025
ikawrakow added a commit that referenced this pull request Nov 11, 2025